Thai grapheme-to-phoneme using probabilistic GLR parser
Abstract
Several properties of the Thai language, such as the absence of explicit word boundaries, the linking of syllables in pronunciation, and homographs, make it challenging to develop a Thai Grapheme-to-Phoneme (G2P) converter. At present there are a couple of Thai G2P systems, which follow either a rule-based or a decision-tree approach. The rule-based approach is limited in how much context it can exploit. The decision-tree approach can capture some local context when making a decision. In contrast, the Probabilistic Generalized LR (PGLR) approach is reported to capture both global and local context efficiently in a probabilistic model. In this paper, we implement a Thai G2P system based on the PGLR approach. Experiments show a word accuracy of 90.44% when vowel length is ignored and 72.87% under exact-match evaluation. These results are superior to those of the rule-based and decision-tree approaches.
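The two evaluation settings reported in the abstract differ only in whether vowel length counts toward a match. The sketch below illustrates that distinction and is not the authors' evaluation code. It assumes a hypothetical transcription in which a long vowel is marked with a trailing `:`, and the function names and toy strings are made up for illustration.

```python
from typing import Sequence


def strip_vowel_length(pron: str) -> str:
    """Remove the (assumed) ':' long-vowel marker from a transcription."""
    return pron.replace(":", "")


def word_accuracy(
    predicted: Sequence[str],
    reference: Sequence[str],
    ignore_vowel_length: bool = False,
) -> float:
    """Fraction of words whose predicted pronunciation matches the reference.

    With ignore_vowel_length=True, long and short vowels are treated as equal;
    otherwise the whole transcription must match exactly.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = 0
    for pred, ref in zip(predicted, reference):
        if ignore_vowel_length:
            pred, ref = strip_vowel_length(pred), strip_vowel_length(ref)
        if pred == ref:
            correct += 1
    return correct / len(reference)


# Toy, made-up transcriptions (not data from the paper).
predicted = ["ka:n", "kin", "ma:"]
reference = ["kan", "kin", "ma:"]

exact = word_accuracy(predicted, reference)
relaxed = word_accuracy(predicted, reference, ignore_vowel_length=True)
print(f"exact match: {exact:.2%}, ignoring vowel length: {relaxed:.2%}")
```

On this toy input, the first word differs from its reference only in vowel length, so it counts as an error under exact match but as correct under the relaxed setting. The same kind of difference separates the two accuracy figures in the abstract.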
Similar resources
Example-based grapheme-to-phoneme conversion for Thai
Several characteristics of the Thai writing system make Thai grapheme-to-phoneme (G2P) conversion very challenging. In this paper, we propose an Example-Based Grapheme-to-Phoneme conversion approach. It generates the pronunciation of a word by selecting, modifying, and combining pronunciations of syllables from the training corpus. The best system achieves 80.99% word accuracy and 94.19% phone accu...
Thai Grapheme-Based Speech Recognition
In this paper we present the results of building a grapheme-based speech recognition system for Thai. We experiment with different settings for the initial context-independent system, different numbers of acoustic models, and different contexts for the speech unit. In addition, we investigate the potential of an enhanced tree clustering method as a way of sharing parameters across models. We com...
Integrating Thai grapheme based acoustic models into the ML-MIX framework - for language independent and cross-language ASR
Grapheme based speech recognition is a powerful tool for rapidly creating automatic speech recognition (ASR) systems in new languages. For purposes of language independent or cross language speech recognition it is necessary to identify similar models in the different languages involved. For phoneme based multilingual ASR systems this is usually achieved with the help of a language independent ...
Improving grapheme-based ASR by probabilistic lexical modeling approach
There is growing interest in using graphemes as subword units, especially in the context of the rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) systems, as it eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the relationship between acoustic feature observations and grapheme states may not always be trivial. It usual...